The TreeBanker: a Tool for Supervised Training of Parsed Corpora

Author

David M. Carter

Abstract

I describe the TreeBanker, a graphical tool for the supervised training involved in domain customization of the disambiguation component of a speech- or language-understanding system. The TreeBanker presents a user, who need not be a system expert, with a range of properties that distinguish competing analyses for an utterance and that are relatively easy to judge. This allows training on a corpus to be completed in far less time, and with far less expertise, than would be needed if analyses were inspected directly: it becomes possible for a corpus of about 20,000 sentences of the complexity of those in the ATIS corpus to be judged in around three weeks of work by a linguistically aware non-expert.

Similar resources

Journées ATALA, 18–19 juin 1999, Corpus annotés pour la syntaxe SYNTACTIC ANNOTATION OF A GERMAN NEWSPAPER CORPUS

Data-oriented and corpus-based methods have become one of the most important areas of applied as well as theoretical NLP. Currently, the methods prevailingly belong to the supervised learning paradigm, i.e., they require as training material large corpora annotated with linguistic information. Since the preparation of such corpora usually involves manual human work, a lot of effort is put into ...

Syntactic-Based Methods for Measuring Word Similarity

This paper explores different strategies for extracting similarity relations between words from partially parsed text corpora. The strategies we have analysed do not require supervised training nor semantic information available from general lexical resources. They differ in the amount and the quality of the syntactic contexts against which words are compared. The paper presents in detail the ...

Automatic Selection of High Quality Parses Created By a Fully Unsupervised Parser

The average results obtained by unsupervised statistical parsers have greatly improved in the last few years, but on many specific sentences they are of rather low quality. The output of such parsers is becoming valuable for various applications, and it is radically less expensive to create than manually annotated training data. Hence, automatic selection of high quality parses created by unsup...

Supervised Grammar Induction using Training Data with Limited Constituent Information

Corpus-based grammar induction generally relies on hand-parsed training data to learn the structure of the language. Unfortunately, the cost of building large annotated corpora is prohibitively expensive. This work aims to improve the induction strategy when there are few labels in the training data. We show that the most informative linguistic constituents are the higher nodes in the parse tre...

Learning Graph Walk Based Similarity Measures for Parsed Text

We consider a parsed text corpus as an instance of a labelled directed graph, where nodes represent words and weighted directed edges represent the syntactic relations between them. We show that graph walks, combined with existing techniques of supervised learning, can be used to derive a task-specific word similarity measure in this graph. We also propose a new path-constrained graph walk meth...

Journal title:

CoRR

Volume: cmp-lg/9705008 (no issue number given)

Pages: -

Publication date: 1997
